CONTRIBUTIONS TO THE
THEORY OF ACCIDENT
PRONENESS
II. True or false contagion
BY
GRACE E. BATES and JERZY NEYMAN
UNIVERSITY OF CALIFORNIA PUBLICATIONS IN STATISTICS
Volume 1, No. 10, pp. 255-276
University of California Publications in Statistics
Editors: J. Neyman, M. Loève, O. Struve
Volume 1, No. 10, pp. 255-276
Submitted by editors November 7, 1951
Issued April 30, 1952
University of California Press
Berkeley and Los Angeles
California
Cambridge University Press
London, England
PRINTED IN THE UNITED STATES OF AMERICA
CONTRIBUTIONS TO THE THEORY OF
ACCIDENT PRONENESS
II. TRUE OR FALSE CONTAGION
BY
GRACE E. BATES AND JERZY NEYMAN
1. Introduction. The first part of the present paper [1]¹ was concerned with the
theoretical aspect of the following practical question: can one use the number of
light accidents incurred by different individuals in the past to predict the number of
severe accidents in a hazardous occupation to be sustained in the future? The theoretical
assumptions underlying this study form an extension of the well-known
scheme due to Greenwood, Yule, and Newbold. An essential part of this scheme is
characterized by the postulates: (i) that the individuals of a population differ from
each other in accident proneness, (ii) that the accidents already incurred do not
change the probabilities of further accidents in the future, and (iii) that these probabilities
stay constant in time and are not modified by the experience that the individual
may gain in the particular occupations. These three postulates may be
symbolized by the combined term “mixture-no contagion-no time-effect model.”
In order to be able to deal with two kinds of accidents, light and severe, the above
three postulates were supplemented by two more: (iv) that the expected number
of light accidents per unit of time is proportional to the expected number λ of severe
accidents (this postulate was termed the fundamental hypothesis), and (v) that to
each severe accident there corresponds a fixed probability θ that the individual
involved in the accident will survive.
The present Part II of the paper deals with a comparison between the foregoing
scheme of mixture-no contagion-no time-effect and an alternative scheme due to
Polya [2]. Polya's scheme postulates (I) identity of the individuals with respect to
accident proneness (thus (I) is the denial of postulate (i)), (II) possible presence
of contagion of a specified type, and (III) possible effect of experience gained since
entering the particular occupation. Curiously, as discussed by Lundberg [3] and
Feller [4 and 5], if one considers the number of accidents incurred in a single period
of time, its distribution implied by the mixture-no contagion-no time-effect model
coincides with the distribution implied by the Polya contagious scheme with the
additional assumption that prior to the period of observation all the individuals
concerned had the same number of accidents.
Naturally, no mathematical model of actual phenomena is ever absolutely exact.
However, it is an undeniable fact that some models fit the particular set of phenomena
better than some others. In the present case a number of problems concerned
with personnel management make it important to distinguish between accidents
which do and those which do not show elements of contagion in the sense of Polya
and between those in which the time elapsed since entering the hazardous occupation
has some effect or no effect on the probability of accidents. This is just the
problem studied in the following pages.
The work on this paper, begun under contract with the School of Aviation Medicine, was completed
with the partial support of the Office of Naval Research. Dr. Bates, a member of the
faculty of Mount Holyoke College, worked at the University of California on this project.
¹ Numbers in brackets refer to references at the end of the paper.
Using the scheme of Polya in a slightly generalized form we deduce the multivariate
distribution of the numbers of accidents of the same severe type incurred and
survived in several successive periods of observation and of the number of these
periods that are survived by the particular individual. This distribution is then
compared with a corresponding distribution implied by the mixture-no contagion-no
time-effect scheme. It is shown that, just as soon as the accidents are observed
in more than one period, equivalence of the two distributions forces a condition on
the parameters of the generalized Polya scheme so that, barring an exceptional particular
case, it is possible to distinguish between the two models.
As a by-product of this study we obtain the joint distributions of the number of
light accidents incurred in the past and of the number of severe accidents to be
incurred in the future, as implied by the postulates (i) through (v), which were given
in Part I without proof. These distributions are deduced separately for those individuals
who survive all the severe accidents incurred and separately for those who
succumb.
In the last section we outline what seems to be a more promising method of approach
to the problem of establishing the presence of contagion. This is based on the
study of the distribution of time intervals between successive accidents incurred
by particular individuals.
2. Basic assumptions. We shall consider an individual I who, from a certain
moment t = 0, is exposed to the risk of accidents of a specified kind, subject to five
postulates formulated below. The totality of these postulates will be denoted by
(P) and described as the generalized Polya contagious scheme or model. In formulating
these postulates, it will be necessary to consider time intervals (0, T_1) and
(T_1, T_2) with 0 < T_1 < T_2. These intervals will be always considered open on the
left and closed on the right, say 0 < t ≤ T_1 and T_1 < t ≤ T_2, where t stands for a
moment in time.
Postulate P_1. The individual I cannot die or otherwise cease to be exposed to accidents
except as a result of an accident which may prove fatal.
Postulate P_2. Whatever the time interval (T_1, T_2) with 0 ≤ T_1 < T_2, if the individual
I is alive at T_1, the number of accidents, say X(T_1, T_2), that he will incur and
survive in (T_1, T_2) is a random variable whose distribution depends on T_1 and T_2 and on
the number of accidents incurred in the time interval (0, T_1), but not on the precise times
when these accidents took place.
Accordingly, we shall consider probabilities P_{m,n}(T_1, T_2) and Q_{m,n}(T_1, T_2) defined
as follows:
P_{m,n}(T_1, T_2) is the conditional probability that during the time interval (T_1, T_2)
the individual I will incur exactly n accidents and that he will survive them all,
given that at time T_1 he had incurred exactly m accidents and survived. If T_1 = 0,
then the only acceptable value of m will be m = 0.
Q_{m,n}(T_1, T_2) is the conditional probability that during the time interval (T_1, T_2)
the individual I will incur exactly n + 1 accidents, that he will survive the first n
and die in the (n + 1)st, given that at time T_1 he had incurred exactly m accidents
and survived. Again, if T_1 = 0 then the only possible value of m will be m = 0.
Obviously,
(1)   \sum_{n=0}^{\infty} \{ P_{m,n}(T_1, T_2) + Q_{m,n}(T_1, T_2) \} = 1 .
Postulate P_3. If T_2 → T_1, all the probabilities P_{m,n}(T_1, T_2) and Q_{m,n}(T_1, T_2)
converge to limits P_{m,n}(T_1, T_1) and Q_{m,n}(T_1, T_1), respectively.
More specifically,
(2)   P_{m,0}(T_1, T_1) = 1
for every m and, consequently, owing to (1),
(3)   P_{m,n}(T_1, T_1) = Q_{m,n-1}(T_1, T_1) = 0
for n ≥ 1. The limits thus postulated will be interpreted as the probabilities of n
accidents, all survived or not, occurring in the interval of time of zero duration.
Postulate P_4. To each accident there corresponds a fixed probability θ of surviving it.
Consequently,
(4)   Q_{m,0}(T_1, T_2) = (1 - \theta) [ 1 - P_{m,0}(T_1, T_2) ] .
Remark: A superficial examination of the problem may suggest that instead of (4)
the probability Q_{m,0}(T_1, T_2) equals
(5)   (1 - \theta) [ P_{m,1}(T_1, T_2) + Q_{m,0}(T_1, T_2) ] .
However, the reader will easily satisfy himself that the presumption is false because
it does not take into account the circumstance that with the moment of a fatal
accident the individual I ceases to be exposed to accidents which might have otherwise
occurred after this accident.
Postulate P_5. At least at T_2 = T_1, the probabilities P_{m,n}(T_1, T_2) and Q_{m,n}(T_1, T_2)
are differentiable with respect to T_2 and, specifically,
(6)   \frac{\partial P_{m,0}(T_1, T_2)}{\partial T_2} \Big|_{T_2 = T_1} = -\lambda \frac{1 + \mu m}{1 + \nu T_1} ,
where λ, μ and ν are nonnegative constants, and
(7)   \frac{\partial P_{m,n}(T_1, T_2)}{\partial T_2} \Big|_{T_2 = T_1} = \frac{\partial Q_{m,n-1}(T_1, T_2)}{\partial T_2} \Big|_{T_2 = T_1} = 0 , for n ≥ 2 .
It will be observed that with λ > 0, μ > 0 and ν > 0 equation (6) implies the
contagion and the time effect. P_{m,0}(T_1, T_2) represents the probability of avoiding
accidents in (T_1, T_2). When T_2 = T_1, then, according to (2), this probability is unity.
Equation (6) implies that, with the increase of T_2, the speed of falling off in this
probability is increased with the increase of m and is decreased with the increase
of T_1. Equations (7) imply that with T_2 → T_1, the probability of more than one
accident in (T_1, T_2) decreases faster than the difference T_2 - T_1. Also if μ = 0, then
there is no contagion. If ν = 0, then there is no time effect.
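The reader will notice that (4) and (6) imply
(8)   \frac{\partial Q_{m,0}(T_1, T_2)}{\partial T_2} \Big|_{T_2 = T_1} = (1 - \theta) \lambda \frac{1 + \mu m}{1 + \nu T_1}
and that then (1) and (7) imply
(9)   \frac{\partial P_{m,1}(T_1, T_2)}{\partial T_2} \Big|_{T_2 = T_1} = \theta \lambda \frac{1 + \mu m}{1 + \nu T_1} .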
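As a small numerical illustration, the three initial rates in (6), (8) and (9) can be tabulated and checked to sum to zero, as (1) requires. This is only a sketch: the function name `initial_rates` and the parameter values are assumptions made for illustration and do not come from the paper.

```python
def initial_rates(m, t1, lam, mu, nu, theta):
    """Derivatives in T2, evaluated at T2 = T1, as given by (6), (8) and (9)."""
    intensity = lam * (1.0 + mu * m) / (1.0 + nu * t1)  # accident rate at time t1
    d_p0 = -intensity                    # (6): rate of change of P_{m,0}
    d_q0 = (1.0 - theta) * intensity     # (8): rate of change of Q_{m,0}
    d_p1 = theta * intensity             # (9): rate of change of P_{m,1}
    return d_p0, d_q0, d_p1


# Illustrative (hypothetical) parameter values.
rates = initial_rates(m=2, t1=1.0, lam=0.5, mu=0.4, nu=0.2, theta=0.9)
balance = sum(rates)  # total probability is conserved, so this should vanish
```

With μ > 0 the rate grows with the number m of past accidents (contagion), and with ν > 0 it shrinks as T_1 grows (time effect).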
Polya's original scheme was considered as a limiting case of a system of drawings
from an urn and this led to the assumption μ = ν. Also, in the original scheme of
Polya θ = 1, so that there is no room for the probabilities Q_{m,n}(T_1, T_2).
As mentioned, the combination of postulates P_1 through P_5 will be denoted by
(P) and described as the generalized Polya contagious scheme. This scheme will be
contrasted with another scheme to be denoted by (N) (connoting Newbold) which
consists of postulates P_1 through P_5 supplemented by P_6 and P_7 as follows:
Postulate P_6. Contagion and time effect are absent, so that μ = ν = 0.
Postulate P_7. The parameter λ in (6), (8) and (9) is a particular value of a random
variable Λ with the probability density function
(10)   p_\Lambda(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda}   for 0 < λ ,
where α and β are arbitrary positive numbers.
It will be seen that except for the probability θ of surviving an accident, model
(N) coincides with the original mixture-no contagion-no time-effect model of
Greenwood, Yule, and Newbold.
Most of the study given below will refer to model (P) and it appears unnecessary
to complicate the formulae with constant explicit references to this model. References
to the two models in the form of letters P or N behind a vertical bar will
appear only in cases when there may be a misunderstanding.
3. Problem studied. Considering model (P) we shall visualize s + 1 consecutive
periods of time, the i-th period beginning with t_{i-1} ≥ 0 and ending with t_i > t_{i-1}, where
t_0 = 0 and t_{s+1} = +∞. As before, these periods of time will be open on the left and
closed on the right. With these periods of time we shall associate s + 2 random
variables.
The random variable Z is defined as the number of complete periods of time survived
by the individual I. Thus if Z = 0, then the individual I meets with a fatal
accident in (0, t_1), etc. If Z = s + 1 then the individual I survives up to t_{s+1} = +∞.
Obviously Z is capable of assuming integer values from zero to s + 1.
With each interval (t_{i-1}, t_i), where i = 1, 2, ..., s + 1, we associate a random variable
X_i defined as the number of accidents incurred after the moment t_{i-1} and up to
and including t_i, which the individual I will survive.
The variables X_i and Z are interdependent. Denote by k the value assumed by Z.
If i ≤ k then X_i equals the number of accidents incurred by I in (t_{i-1}, t_i), all of which
are survived. If i = k + 1, then X_i equals one less than the number of accidents incurred
by I in (t_k, t_{k+1}), the last of these accidents being necessarily fatal. Finally, if
i > k + 1, then X_i = 0.
Our problem is to deduce the joint probability generating function of the variables
Z, X_1, X_2, ..., X_{s+1}.
Whatever be the random variables Y_1, Y_2, ..., Y_r capable of assuming nonnegative
integer values and whatever be the hypotheses H we shall use the generic
symbol
(11)   G_{Y_1, \dots, Y_r}(u_1, u_2, \dots, u_r \mid H)
       = \sum_{k_1=0}^{\infty} \sum_{k_2=0}^{\infty} \cdots \sum_{k_r=0}^{\infty} u_1^{k_1} u_2^{k_2} \cdots u_r^{k_r} P\{ (Y_1 = k_1)(Y_2 = k_2) \cdots (Y_r = k_r) \mid H \}
to denote the conditional probability generating function of Y_1, Y_2, ..., Y_r, given
the hypothesis H. Here the argument u_i corresponds to the variable Y_i and it is assumed
that |u_i| ≤ 1, for i = 1, 2, ..., r. For the variables Z, X_1, ..., X_{s+1}, the
argument of the probability generating function which corresponds to Z will be denoted
by v and the argument corresponding to X_i by u_i, i = 1, 2, ..., s + 1. With
this notation, the object of our study is the generating function
(12)   G_{Z, X_1, \dots, X_{s+1}}(v, u_1, u_2, \dots, u_{s+1} \mid P)
implied by model (P), its particular case
(13)   G_{Z, X_1, \dots, X_{s+1}}(v, u_1, u_2, \dots, u_{s+1} \mid [\mu = \nu = 0] P)
and the counterpart of (12) implied by mixture-no contagion-no time-effect model
(N). Obviously (13) is a function of λ and we have
(14)   G_{Z, X_1, \dots, X_{s+1}}(v, u_1, u_2, \dots, u_{s+1} \mid N)
       = \int_0^{\infty} G_{Z, X_1, \dots, X_{s+1}}(v, u_1, \dots, u_{s+1} \mid [\mu = \nu = 0] P) \, p_\Lambda(\lambda) \, d\lambda .
In the following we shall have occasion to use the fundamental relation between
the absolute and the conditional expectation, familiar for a long time, but first
rigorously established by Kolmogoroff [6]. Let Y_1, Y_2, ..., Y_r be any random
variables and let f(y_1, y_2, ..., y_r) be any Borel measurable function of real arguments
y_1, y_2, ..., y_r. Then
(15)   E[ f(Y_1, Y_2, \dots, Y_r) ] = E\{ E[ f(Y_1, Y_2, \dots, Y_r) \mid Y_1, Y_2, \dots, Y_{r-1} ] \} .
4. Preliminary formulae. Applying (15) we may write
(16)   G_{Z, X_1, \dots, X_{s+1}}(v, u_1, u_2, \dots, u_{s+1})
       = E\Big[ v^Z \prod_{i=1}^{s+1} u_i^{X_i} \Big]
       = E\Big[ v^Z E\Big( \prod_{i=1}^{s+1} u_i^{X_i} \,\Big|\, Z \Big) \Big]
       = \sum_{m=0}^{s+1} v^m G_{X_1, \dots, X_{s+1}}(u_1, \dots, u_{s+1} \mid Z = m) P\{Z = m\} .
Since Z = m implies X_i = 0 for i > m + 1, the conditional probability generating
function on the right of all the s + 1 variables X_1, X_2, ..., X_{s+1} reduces to
(17)   G_{X_1, \dots, X_{m+1}}(u_1, \dots, u_{m+1} \mid Z = m) .
Our first step, then, will be to provide means for computing the probabilities
P{Z = m} and the conditional probability generating functions (17). For this purpose
we return to the probabilities P_{m,n}(T_1, T_2) and Q_{m,n}(T_1, T_2) introduced in section
2. Multiplying them by u^n and summing for n from zero to infinity, we get, say
(18)   g_m(T_1, T_2, u) = \sum_{n=0}^{\infty} u^n P_{m,n}(T_1, T_2)
and
(19)   h_m(T_1, T_2, u) = \sum_{n=0}^{\infty} u^n Q_{m,n}(T_1, T_2) .
For |u| ≤ 1 both series converge and determine g_m and h_m as functions of u which
are differentiable for |u| < 1. In many instances below the value of u will be immaterial
and in these cases, to simplify the notation, we shall omit u from the symbol of
the two functions. Also, whenever there is no danger of misunderstanding, we shall
occasionally omit all three arguments and write simply g_m and h_m for the left-hand
sides of (18) and (19).
In order to determine the functions g and h we proceed in the familiar manner
[4 and 5] and write down the relation between P_{m,n}(T_1, T_2) and P_{m,n}(T_1, T_2 + τ)
where τ > 0. We have
(20)   P_{m,0}(T_1, T_2 + \tau) = P_{m,0}(T_1, T_2) P_{m,0}(T_2, T_2 + \tau)
and
(21)   P_{m,n}(T_1, T_2 + \tau) = P_{m,n}(T_1, T_2) P_{m+n,0}(T_2, T_2 + \tau)
       + P_{m,n-1}(T_1, T_2) P_{m+n-1,1}(T_2, T_2 + \tau) + o(\tau)
for n > 0, where, owing to (7), o(τ) decreases when τ → 0 and the rate of decrease is
faster than that of τ. Subtracting P_{m,0}(T_1, T_2) from both sides of (20) and P_{m,n}(T_1, T_2)
from both sides of (21), dividing the results by τ, passing to the limit as τ → 0, and
using (6) and (9) we obtain
(22)   \frac{\partial P_{m,0}(T_1, T_2)}{\partial T_2} = -\lambda \frac{1 + \mu m}{1 + \nu T_2} P_{m,0}(T_1, T_2)
and
(23)   \frac{\partial P_{m,n}(T_1, T_2)}{\partial T_2} = -\lambda \frac{1 + \mu m + \mu n}{1 + \nu T_2} P_{m,n}(T_1, T_2)
       + \theta \lambda \frac{1 + \mu m + \mu n - \mu}{1 + \nu T_2} P_{m,n-1}(T_1, T_2) .
Now we multiply (23) by u^n, sum for n from unity to +∞, add to (22), use (18)
and obtain, after some easy algebra,
(24)   (1 + \nu T_2) \frac{\partial g_m}{\partial T_2} + \lambda \mu u (1 - \theta u) \frac{\partial g_m}{\partial u} = -\lambda (1 + \mu m)(1 - \theta u) g_m .
Using the familiar methods, the general solution of this partial differential equation
is easily found to be
(25)   g_m = u^{-(\gamma + m)} f\Big[ \frac{1 - \theta u}{u} A(T_2) \Big]
where, to simplify the formula, γ = 1/μ and, generally,
(26)   A(T) = (1 + \nu T)^{\lambda \mu / \nu} .
Here f(x) stands for an arbitrary differentiable function of the argument x. This
function must be so selected that (25) coincide with g_m(T_1, T_2). For this purpose we
notice that the substitution T_2 = T_1 gives g_m(T_1, T_1) ≡ 1 identically in u and T_1.
Making this substitution in (25) and equating the result to unity we obtain the
condition determining the function f(x),
(27)   f\Big[ \frac{1 - \theta u}{u} A(T_1) \Big] = u^{\gamma + m} .
Now substitute
(28)   \frac{1 - \theta u}{u} A(T_1) = x
and solve for u
(29)   u = \frac{A(T_1)}{x + \theta A(T_1)} .
Substituting (29) into (27) we have
(30)   f(x) = \Big( \frac{A(T_1)}{x + \theta A(T_1)} \Big)^{\gamma + m} .
Now the function f(x) is determined. In order to obtain g_m we substitute (30)
into (25). Easy algebra gives
(31)   g_m(T_1, T_2, u) = \Big\{ \frac{A(T_1)}{\theta A(T_1) + (1 - \theta) A(T_2) + \theta [A(T_2) - A(T_1)](1 - u)} \Big\}^{\gamma + m}
       = [ D(T_1, T_2, u) ]^{\gamma + m} , say.
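The closed form (31) can be checked numerically against the partial differential equation (24). The following sketch evaluates both sides of (24) with central finite differences; the function names and the parameter values are illustrative assumptions, not part of the paper.

```python
def A(t, lam, mu, nu):
    """A(T) = (1 + nu*T)**(lam*mu/nu), formula (26)."""
    return (1.0 + nu * t) ** (lam * mu / nu)


def g(m, t1, t2, u, lam, mu, nu, theta):
    """g_m(T1, T2, u) from (31); the exponent is gamma + m with gamma = 1/mu."""
    a1, a2 = A(t1, lam, mu, nu), A(t2, lam, mu, nu)
    d = a1 / (theta * a1 + (1.0 - theta) * a2 + theta * (a2 - a1) * (1.0 - u))
    return d ** (1.0 / mu + m)


def pde_residual(m, t1, t2, u, lam, mu, nu, theta, eps=1e-5):
    """Left side minus right side of (24), derivatives by central differences."""
    dg_dt = (g(m, t1, t2 + eps, u, lam, mu, nu, theta)
             - g(m, t1, t2 - eps, u, lam, mu, nu, theta)) / (2 * eps)
    dg_du = (g(m, t1, t2, u + eps, lam, mu, nu, theta)
             - g(m, t1, t2, u - eps, lam, mu, nu, theta)) / (2 * eps)
    lhs = (1.0 + nu * t2) * dg_dt + lam * mu * u * (1.0 - theta * u) * dg_du
    rhs = -lam * (1.0 + mu * m) * (1.0 - theta * u) * g(m, t1, t2, u, lam, mu, nu, theta)
    return lhs - rhs


# Hypothetical parameter values chosen only for the check.
params = dict(lam=0.8, mu=0.5, nu=0.3, theta=0.9)
residual = pde_residual(m=1, t1=0.5, t2=2.0, u=0.6, **params)
start_value = g(1, 0.5, 0.5, 0.6, **params)  # boundary condition g_m(T1, T1) = 1
```

A residual near zero and a starting value of unity are what (24) and the boundary condition require of (31).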
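Now we turn to the function h_m(T_1, T_2, u) generating the probabilities Q_{m,n}(T_1, T_2).
Using the same method we write
(32)   Q_{m,n}(T_1, T_2 + \tau) = Q_{m,n}(T_1, T_2) + P_{m,n}(T_1, T_2) Q_{m+n,0}(T_2, T_2 + \tau) + o(\tau)
and it follows
(33)   \frac{\partial Q_{m,n}(T_1, T_2)}{\partial T_2} = (1 - \theta) \lambda \frac{1 + \mu m + \mu n}{1 + \nu T_2} P_{m,n}(T_1, T_2) .
Multiplying this result by u^n, summing for n from zero to infinity and using (18)
and (19) we obtain
(34)   \frac{\partial h_m}{\partial T_2} = (1 - \theta) \lambda \Big[ \frac{1 + \mu m}{1 + \nu T_2} g_m + \frac{\mu u}{1 + \nu T_2} \frac{\partial g_m}{\partial u} \Big] .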
The explicit expression of the derivative of h_m in (34) is obtained using (31). Since
at T_2 = T_1, the value of h_m must be zero identically in u, an easy integration gives
(35)   h_m(T_1, T_2, u) = \frac{1 - \theta}{1 - \theta u} [ 1 - g_m(T_1, T_2, u) ]
where g_m is given by (31).
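A numerical sketch of the same kind can confirm that (35) satisfies (34) and that g_m and h_m together account for all probability at u = 1, in agreement with (1). The helper names and parameter values below are assumptions made for illustration.

```python
def A(t, lam, mu, nu):
    """A(T) from (26)."""
    return (1.0 + nu * t) ** (lam * mu / nu)


def g(m, t1, t2, u, lam, mu, nu, theta):
    """g_m(T1, T2, u) from (31)."""
    a1, a2 = A(t1, lam, mu, nu), A(t2, lam, mu, nu)
    d = a1 / (theta * a1 + (1.0 - theta) * a2 + theta * (a2 - a1) * (1.0 - u))
    return d ** (1.0 / mu + m)


def h(m, t1, t2, u, lam, mu, nu, theta):
    """h_m(T1, T2, u) from (35)."""
    return (1.0 - theta) / (1.0 - theta * u) * (1.0 - g(m, t1, t2, u, lam, mu, nu, theta))


def residual_34(m, t1, t2, u, lam, mu, nu, theta, eps=1e-5):
    """Left side minus right side of (34), derivatives by central differences."""
    dh_dt = (h(m, t1, t2 + eps, u, lam, mu, nu, theta)
             - h(m, t1, t2 - eps, u, lam, mu, nu, theta)) / (2 * eps)
    dg_du = (g(m, t1, t2, u + eps, lam, mu, nu, theta)
             - g(m, t1, t2, u - eps, lam, mu, nu, theta)) / (2 * eps)
    rhs = (1.0 - theta) * lam / (1.0 + nu * t2) * (
        (1.0 + mu * m) * g(m, t1, t2, u, lam, mu, nu, theta) + mu * u * dg_du)
    return dh_dt - rhs


# Hypothetical parameter values chosen only for the check.
params = dict(lam=0.8, mu=0.5, nu=0.3, theta=0.9)
residual = residual_34(m=2, t1=0.5, t2=1.5, u=0.4, **params)
total = g(2, 0.5, 1.5, 1.0, **params) + h(2, 0.5, 1.5, 1.0, **params)  # relation (1)
```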
Formulae (31) and (35) play a basic role in our further study. We begin by using
(31) to evaluate the frequency function of the random variable Z.
5. Probability of surviving exactly j complete periods of observation. Referring
to the definition of the function g_m(T_1, T_2, u) and of the probabilities P_{m,n}(T_1, T_2), it
is easy to see that g_m(T_1, T_2, 1) represents the conditional probability that the individual
I will survive at least up to and including T_2, given that he was alive at T_1
and that up to the moment T_1 he sustained exactly m accidents. In particular, we
obtain from (31)
(36)   g_0(0, T, 1) = [ \theta + A(T)(1 - \theta) ]^{-\gamma}
for the probability that the individual I, alive at t = 0, will survive at least up to and
including an arbitrary moment T ≥ 0.
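The survival probability (36) can be compared with a direct simulation of the process described by postulates P_1 through P_5. The sketch below assumes, in line with (22), that an individual with m past accidents at time t has accident intensity λ(1 + μm)/(1 + νs) at each later moment s until the next accident, and draws waiting times by inverting the resulting survival function. The function names, the seed, and the parameter values are illustrative assumptions.

```python
import random


def survival_closed_form(T, lam, mu, nu, theta):
    """Formula (36): probability of being alive at time T."""
    a = (1.0 + nu * T) ** (lam * mu / nu)
    return (theta + (1.0 - theta) * a) ** (-1.0 / mu)


def survival_simulated(T, lam, mu, nu, theta, n_paths, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(n_paths):
        t, m, alive = 0.0, 0, True
        while alive:
            c = lam * (1.0 + mu * m)        # intensity is c / (1 + nu * time)
            w = 1.0 - rng.random()          # uniform draw in (0, 1]
            # Solve ((1 + nu*t_next) / (1 + nu*t)) ** (-c / nu) = w for t_next.
            t = ((1.0 + nu * t) * w ** (-nu / c) - 1.0) / nu
            if t > T:
                break                       # no further accident before T
            if rng.random() < theta:
                m += 1                      # accident survived
            else:
                alive = False               # fatal accident
        survivors += alive
    return survivors / n_paths


# Hypothetical parameter values.
exact = survival_closed_form(2.0, lam=1.0, mu=0.5, nu=0.3, theta=0.8)
estimate = survival_simulated(2.0, lam=1.0, mu=0.5, nu=0.3, theta=0.8, n_paths=20000)
```

With these values the closed form is roughly 0.65, and the simulated frequency should agree with it to within sampling error.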
Now return to the random variable Z defined as the exact number of complete
time intervals (t_{i-1}, t_i) which the individual I, alive at time zero, will survive. Whatever
the nonnegative integer j it is obvious that
(37)   P\{Z \ge j\} = g_0(0, t_j, 1) = [ \theta + A_j (1 - \theta) ]^{-\gamma}
where, to simplify the notation, A_j = A(t_j). It will be noticed that the conventional
definition of t_{s+1} = +∞ implies A_{s+1} = +∞ and, therefore, P{Z ≥ s + 1} = 0. Now,
the probability that I will survive exactly j complete intervals (t_{i-1}, t_i) is
(38)   P\{Z = j\} = P\{Z \ge j\} - P\{Z \ge j + 1\}
       = [ \theta + A_j (1 - \theta) ]^{-\gamma} - [ \theta + A_{j+1} (1 - \theta) ]^{-\gamma}
for j = 0, 1, 2, ..., s; while
(39)   P\{Z = s + 1\} = 0 .
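Formulas (37) through (39) translate directly into a short routine for the distribution of Z. In this sketch the list `times` holds t_0 = 0, t_1, ..., t_s, the last period being understood to extend to infinity; the names and numbers are assumptions for illustration.

```python
def A(t, lam, mu, nu):
    """A(T) from (26)."""
    return (1.0 + nu * t) ** (lam * mu / nu)


def prob_z_at_least(j, times, lam, mu, nu, theta):
    """(37): P{Z >= j}; times = [t_0 = 0, t_1, ..., t_s], with t_{s+1} = +infinity."""
    if j >= len(times):  # j = s + 1: the infinite last period is never survived
        return 0.0
    return (theta + (1.0 - theta) * A(times[j], lam, mu, nu)) ** (-1.0 / mu)


def prob_z_equals(j, times, lam, mu, nu, theta):
    """(38): P{Z = j} for j = 0, 1, ..., s."""
    return (prob_z_at_least(j, times, lam, mu, nu, theta)
            - prob_z_at_least(j + 1, times, lam, mu, nu, theta))


# Hypothetical example with s = 3 finite periods of unit length.
times = [0.0, 1.0, 2.0, 3.0]
params = dict(lam=1.0, mu=0.5, nu=0.3, theta=0.8)
distribution = [prob_z_equals(j, times, **params) for j in range(len(times))]
```

The probabilities telescope, so the entries of `distribution` add up to P{Z ≥ 0} = 1.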
6. Probability generating function of Z, X_1, X_2, ..., X_{s+1} implied by model (P).
In order to deduce the expression for the probability generating function desired we
first establish a convenient recurrence formula.
Let n_1, n_2, ..., n_j be any nonnegative integers. Define S_0 = 0 and generally
(40)   S_i = \sum_{k=1}^{i} n_k .
Then the product
(41)   \prod_{i=1}^{j} P_{S_{i-1}, n_i}(t_{i-1}, t_i)
represents the probability that the individual I, alive at time zero, will survive at
least up to and including t_j, and that in the interval (t_{i-1}, t_i), with i = 1, 2, 3, ..., j,
he will survive exactly n_i accidents. It follows that, by dividing (41) by P{Z ≥ j} we
shall obtain the conditional probability of the compound event
(42)   (X_1 = n_1)(X_2 = n_2) \cdots (X_j = n_j)
given that the individual I survives up to and including t_j. Thus
(43)   G_{X_1, \dots, X_j}(u_1, u_2, \dots, u_j \mid Z \ge j) = \frac{1}{P\{Z \ge j\}} \sum \prod_{i=1}^{j} u_i^{n_i} P_{S_{i-1}, n_i}(t_{i-1}, t_i)
where Σ symbolizes j-fold summation for n_1, n_2, ..., n_j, each from zero to infinity.
Referring to (18) and (31) we see that the last of these summations gives
(44)   \sum_{n_j=0}^{\infty} u_j^{n_j} P_{S_{j-1}, n_j}(t_{j-1}, t_j)
       = \Big( \frac{A_{j-1}}{\theta A_{j-1} + (1 - \theta) A_j + \theta (A_j - A_{j-1})(1 - u_j)} \Big)^{\gamma + S_{j-1}}
       = [ D(t_{j-1}, t_j, u_j) ]^{\gamma + S_{j-1}} .
We have then, in particular,
(45)   G_{X_1}(u_1 \mid Z \ge 1) = \frac{g_0(0, t_1, u_1)}{P\{Z \ge 1\}} = \frac{g_0(0, t_1, u_1)}{g_0(0, t_1, 1)}
       = \Big[ 1 + \frac{\theta (A_1 - A_0)(1 - u_1)}{\theta + (1 - \theta) A_1} \Big]^{-\gamma} .
Returning now to (43), if we multiply both sides of this equation by P{Z ≥ j}
and use (44), we obtain
(46)   P\{Z \ge j\} G_{X_1, \dots, X_j}(u_1, u_2, \dots, u_j \mid Z \ge j)
       = [ D(t_{j-1}, t_j, u_j) ]^{\gamma} \Big\{ \sum \prod_{i=1}^{j-1} (u_i D)^{n_i} P_{S_{i-1}, n_i}(t_{i-1}, t_i) \Big\}
where, for short, D = D(t_{j-1}, t_j, u_j) and the summation extends over all combinations
of values of n_1, n_2, ..., n_{j-1} from zero to infinity.
It is easily seen from (43) that the expression in curved brackets in (46) is equal to
(47)   P\{Z \ge j - 1\} G_{X_1, \dots, X_{j-1}}(u_1 D, \dots, u_{j-1} D \mid Z \ge j - 1) .
This establishes the recurrence formula sought, namely
(48)   G_{X_1, \dots, X_j}(u_1, u_2, \dots, u_j \mid Z \ge j)
       = \frac{P\{Z \ge j - 1\}}{P\{Z \ge j\}} D^{\gamma} G_{X_1, \dots, X_{j-1}}(u_1 D, \dots, u_{j-1} D \mid Z \ge j - 1) .
Using this formula and (37), (45) we easily obtain
(49)   G_{X_1, X_2}(u_1, u_2 \mid Z \ge 2)
       = \Big\{ 1 + \frac{\theta}{\theta + (1 - \theta) A_2} [ (A_1 - A_0)(1 - u_1) + (A_2 - A_1)(1 - u_2) ] \Big\}^{-\gamma}
and generally, by induction
(50)   G_{X_1, \dots, X_j}(u_1, u_2, \dots, u_j \mid Z \ge j)
       = \Big\{ 1 + \frac{\theta}{\theta + (1 - \theta) A_j} \sum_{i=1}^{j} (A_i - A_{i-1})(1 - u_i) \Big\}^{-\gamma} .
It is seen that the conditional distribution of X_1, X_2, ..., X_j, given that Z ≥ j, is
always a j-variate negative binomial. We propose to call it the generalized Polya
multivariate distribution.
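To make (50) concrete, the sketch below evaluates the generating function and compares a numerical derivative at u = (1, ..., 1) with the conditional mean γθ(A_i - A_{i-1})/[θ + (1 - θ)A_j], which follows from differentiating (50) with respect to u_i. That mean formula is our own consequence of (50), not a formula quoted from the paper, and the function names and parameter values are assumptions.

```python
def A(t, lam, mu, nu):
    """A(T) from (26)."""
    return (1.0 + nu * t) ** (lam * mu / nu)


def pgf_50(u, times, lam, mu, nu, theta):
    """(50): joint pgf of X_1..X_j given Z >= j; u = [u_1..u_j], times = [t_0..t_j]."""
    a = [A(t, lam, mu, nu) for t in times]
    j = len(u)
    c = theta / (theta + (1.0 - theta) * a[j])
    s = sum((a[i] - a[i - 1]) * (1.0 - u[i - 1]) for i in range(1, j + 1))
    return (1.0 + c * s) ** (-1.0 / mu)


def conditional_mean(i, times, lam, mu, nu, theta):
    """E[X_i | Z >= j], obtained by differentiating (50) at u = (1, ..., 1)."""
    a = [A(t, lam, mu, nu) for t in times]
    j = len(times) - 1
    return (1.0 / mu) * theta * (a[i] - a[i - 1]) / (theta + (1.0 - theta) * a[j])


# Hypothetical example with j = 3 unit periods.
times = [0.0, 1.0, 2.0, 3.0]
params = dict(lam=1.0, mu=0.5, nu=0.3, theta=0.8)
eps = 1e-6
numeric_mean_2 = (pgf_50([1.0, 1.0 + eps, 1.0], times, **params)
                  - pgf_50([1.0, 1.0 - eps, 1.0], times, **params)) / (2 * eps)
exact_mean_2 = conditional_mean(2, times, **params)
```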
Now we can use the same method to compute the conditional distribution of
X_1, X_2, ..., X_{j+1} given that Z = j. To do so we turn our attention to (41) and notice
that, if this product is multiplied by Q_{S_j, n_{j+1}}(t_j, t_{j+1}) then the result will equal the probability
that the individual I, alive at zero time, will sustain and survive exactly n_i
accidents in (t_{i-1}, t_i) for i = 1, 2, ..., j + 1 and that he will perish at the (n_{j+1} + 1)st
accident between t_j and t_{j+1}. It follows that
(51)   P\{Z = j\} G_{X_1, \dots, X_{j+1}}(u_1, u_2, \dots, u_{j+1} \mid Z = j)
       = \sum \prod_{i=1}^{j} u_i^{n_i} P_{S_{i-1}, n_i}(t_{i-1}, t_i) \sum_{n_{j+1}=0}^{\infty} u_{j+1}^{n_{j+1}} Q_{S_j, n_{j+1}}(t_j, t_{j+1})
where the first sum extends over all values of n_1, n_2, ..., n_j from zero to infinity.
However, referring to (19) and (35), we see that the last sum coincides with
(52)   h_{S_j}(t_j, t_{j+1}, u_{j+1}) = \frac{1 - \theta}{1 - \theta u_{j+1}} [ 1 - g_{S_j}(t_j, t_{j+1}, u_{j+1}) ] .
Substituting this result into (51), we have
(53)   P\{Z = j\} G_{X_1, \dots, X_{j+1}}(u_1, u_2, \dots, u_{j+1} \mid Z = j)
       = \frac{1 - \theta}{1 - \theta u_{j+1}} \sum \prod_{i=1}^{j} u_i^{n_i} P_{S_{i-1}, n_i}(t_{i-1}, t_i) [ 1 - g_{S_j}(t_j, t_{j+1}, u_{j+1}) ] .
Referring again to (43) and (44), we obtain easily
(54)   P\{Z = j\} G_{X_1, \dots, X_{j+1}}(u_1, u_2, \dots, u_{j+1} \mid Z = j)
       = \frac{1 - \theta}{1 - \theta u_{j+1}} \Big\{ P\{Z \ge j\} G_{X_1, \dots, X_j}(u_1, u_2, \dots, u_j \mid Z \ge j)
       - P\{Z \ge j + 1\} G_{X_1, \dots, X_{j+1}}(u_1, \dots, u_{j+1} \mid Z \ge j + 1) \Big\}
which determines the generating function for the conditional distribution of
X_1, X_2, ..., X_{j+1} given that Z = j, for j = 1, 2, ..., s.
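Substituting the explicit expressions (50) for the generating functions on the right
of (54) and using (37) and (38) we obtain
(55)   P\{Z = j\} G_{X_1, \dots, X_{j+1}}(u_1, u_2, \dots, u_{j+1} \mid Z = j)
       = \frac{1 - \theta}{1 - \theta u_{j+1}} \Big\{ \Big[ \theta + (1 - \theta) A_j + \theta \sum_{i=0}^{j} (A_i - A_{i-1})(1 - u_i) \Big]^{-\gamma}
       - \Big[ \theta + (1 - \theta) A_{j+1} + \theta \sum_{i=0}^{j+1} (A_i - A_{i-1})(1 - u_i) \Big]^{-\gamma} \Big\}
where, with the convention u_0 = 1 (so that the term with i = 0 vanishes), this formula is valid for j = 0, 1, 2, ..., s.
Finally, we have
(56)   G_{Z, X_1, \dots, X_{s+1}}(v, u_1, u_2, \dots, u_{s+1})
       = \sum_{j=0}^{s} v^j \frac{1 - \theta}{1 - \theta u_{j+1}} \Big\{ \Big[ \theta + (1 - \theta) A_j + \theta \sum_{i=0}^{j} (A_i - A_{i-1})(1 - u_i) \Big]^{-\gamma}
       - \Big[ \theta + (1 - \theta) A_{j+1} + \theta \sum_{i=0}^{j+1} (A_i - A_{i-1})(1 - u_i) \Big]^{-\gamma} \Big\} .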
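7. Probability generating function of Z, X_1, X_2, ..., X_{s+1} implied by model (N).
In the present section we use the results obtained to compute (14). For this purpose
it will be sufficient to evaluate the limit
(57)   \lim_{\mu \to 0, \, \nu \to 0} P\{Z = j\} G_{X_1, \dots, X_{j+1}}(u_1, u_2, \dots, u_{j+1} \mid Z = j)
       = F_j(\lambda, u_1, u_2, \dots, u_{j+1}) , say,
and to perform the integration
(58)   \int_0^{\infty} F_j(\lambda, u_1, u_2, \dots, u_{j+1}) \, p_\Lambda(\lambda) \, d\lambda = F_j^*(u_1, u_2, \dots, u_{j+1}) , say.
Then
(59)   G_{Z, X_1, \dots, X_{s+1}}(v, u_1, u_2, \dots, u_{s+1} \mid N) = \sum_{j=0}^{s} v^j F_j^*(u_1, u_2, \dots, u_{j+1}) .
To evaluate the limit in (57) we consider first the expression
(60)   \Big[ \theta + (1 - \theta) A_j + \theta \sum_{i=0}^{j} (A_i - A_{i-1})(1 - u_i) \Big]^{-\gamma} = B_j , say.
We have, recalling the definition of A_j in (26),
(61)   \lim_{\nu \to 0} B_j = \Big[ \theta + (1 - \theta) e^{\lambda \mu t_j} + \theta \sum_{i=0}^{j} ( e^{\lambda \mu t_i} - e^{\lambda \mu t_{i-1}} )(1 - u_i) \Big]^{-1/\mu}
and
(62)   \lim_{\mu \to 0} \lim_{\nu \to 0} B_j = \exp\Big\{ -\lambda \Big[ (1 - \theta) t_j + \theta \sum_{i=0}^{j} (t_i - t_{i-1})(1 - u_i) \Big] \Big\} .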
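The passage to the limit can be illustrated numerically by evaluating B_j of (60) for small μ and ν and comparing with the exponential in (62). The sketch starts the sums at i = 1, since the term with i = 0 vanishes under the convention u_0 = 1; the function names and values are assumptions made for illustration.

```python
import math


def B(u, times, lam, mu, nu, theta):
    """B_j of (60) with j = len(u); u = [u_1..u_j], times = [t_0..t_j]."""
    a = [(1.0 + nu * t) ** (lam * mu / nu) for t in times]
    j = len(u)
    s = sum((a[i] - a[i - 1]) * (1.0 - u[i - 1]) for i in range(1, j + 1))
    return (theta + (1.0 - theta) * a[j] + theta * s) ** (-1.0 / mu)


def B_limit(u, times, lam, theta):
    """Right-hand side of (62)."""
    j = len(u)
    s = sum((times[i] - times[i - 1]) * (1.0 - u[i - 1]) for i in range(1, j + 1))
    return math.exp(-lam * ((1.0 - theta) * times[j] + theta * s))


# Hypothetical values; mu and nu are taken small to approach the limit.
u = [0.3, 0.6]
times = [0.0, 1.0, 2.0]
near_limit = B(u, times, lam=0.7, mu=1e-5, nu=1e-5, theta=0.9)
limit = B_limit(u, times, lam=0.7, theta=0.9)
```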
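It follows, then, from (56) that
(63)   F_j(\lambda, u_1, u_2, \dots, u_{j+1})
       = \frac{1 - \theta}{1 - \theta u_{j+1}} \Big\{ \exp\Big( -\lambda \Big[ (1 - \theta) t_j + \theta \sum_{i=0}^{j} (t_i - t_{i-1})(1 - u_i) \Big] \Big)
       - \exp\Big( -\lambda \Big[ (1 - \theta) t_{j+1} + \theta \sum_{i=0}^{j+1} (t_i - t_{i-1})(1 - u_i) \Big] \Big) \Big\} .
Easy integration gives
(64)   F_j^*(u_1, u_2, \dots, u_{j+1})
       = \frac{1 - \theta}{1 - \theta u_{j+1}} \Big\{ \Big[ 1 + (1 - \theta) \beta^{-1} t_j + \beta^{-1} \theta \sum_{i=0}^{j} (t_i - t_{i-1})(1 - u_i) \Big]^{-\alpha}
       - \Big[ 1 + (1 - \theta) \beta^{-1} t_{j+1} + \beta^{-1} \theta \sum_{i=0}^{j+1} (t_i - t_{i-1})(1 - u_i) \Big]^{-\alpha} \Big\} .
Substituting this expression for F_j^* in the right hand side of (59) we get the desired
generating function.
It will be noticed from (59) that substituting into F_j^* unity for each of its arguments,
we obtain the probability that Z = j as implied by model (N). Thus
(65)   P\{Z = j \mid N\} = [ 1 + (1 - \theta) \beta^{-1} t_j ]^{-\alpha} - [ 1 + (1 - \theta) \beta^{-1} t_{j+1} ]^{-\alpha}
and with the convention t_{s+1} = +∞, it follows that
(66)   P\{Z \ge j \mid N\} = [ 1 + (1 - \theta) \beta^{-1} t_j ]^{-\alpha} .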
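Formula (66) can be checked by carrying out the mixture integration numerically. For a fixed λ, setting all arguments of (63) to unity and summing gives the survival probability exp[-λ(1 - θ)t_j], and averaging this over the density (10) should reproduce (66). The sketch below uses composite Simpson quadrature; the function names, the truncation point, and the parameter values are assumptions.

```python
import math


def gamma_density(x, alpha, beta):
    """Density (10) of the random proneness."""
    return beta ** alpha / math.gamma(alpha) * x ** (alpha - 1.0) * math.exp(-beta * x)


def tail_closed_form(t, alpha, beta, theta):
    """Formula (66): P{Z >= j | N} with t = t_j."""
    return (1.0 + (1.0 - theta) * t / beta) ** (-alpha)


def tail_by_quadrature(t, alpha, beta, theta, upper=80.0, n=40000):
    """Composite Simpson integral of exp(-x*(1-theta)*t) against the density (10)."""
    step = upper / n  # n must be even for Simpson's rule

    def f(x):
        return math.exp(-x * (1.0 - theta) * t) * gamma_density(x, alpha, beta)

    total = f(0.0) + f(upper)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(k * step)
    return total * step / 3.0


# Hypothetical values; alpha >= 1 keeps the integrand bounded at zero.
closed = tail_closed_form(3.0, alpha=2.0, beta=1.5, theta=0.8)
numeric = tail_by_quadrature(3.0, alpha=2.0, beta=1.5, theta=0.8)
```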
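Dividing (64) by (65) we obtain a formula determining the conditional probability
generating function of X_1, X_2, ..., X_{j+1}, given that in the interval (t_j, t_{j+1})
the individual I meets with a fatal accident,
(67)   P\{Z = j \mid N\} G_{X_1, \dots, X_{j+1}}(u_1, \dots, u_{j+1} \mid (Z = j), N)
       = \frac{1 - \theta}{1 - \theta u_{j+1}} \Big\{ \Big[ 1 + (1 - \theta) \beta^{-1} t_j + \beta^{-1} \theta \sum_{i=0}^{j} (t_i - t_{i-1})(1 - u_i) \Big]^{-\alpha}
       - \Big[ 1 + (1 - \theta) \beta^{-1} t_{j+1} + \beta^{-1} \theta \sum_{i=0}^{j+1} (t_i - t_{i-1})(1 - u_i) \Big]^{-\alpha} \Big\}
where again we adopt the convention u_0 = 1 so that (67) is valid for j = 1, 2, ..., s.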
When comparing the models (N) and (P), formula (67) should be compared with
(55). Both generate probabilities of the various combinations of values of the X_1,
X_2, ..., X_j and X_{j+1} subject to the restriction that in (t_j, t_{j+1}) the individual I meets
with a fatal accident. In order to obtain for the model (N) the counterpart of formula
(50) we notice that, with u_{j+1} = u_{j+2} = ... = u_{s+1} = 1,
(68)   \sum_{k=j}^{s} F_k^*(u_1, u_2, \dots, u_{k+1}) = P\{Z \ge j \mid N\} G_{X_1, \dots, X_j}(u_1, u_2, \dots, u_j \mid (Z \ge j), N) .
Upon dividing by (66), we obtain
(69)   G_{X_1, \dots, X_j}(u_1, u_2, \dots, u_j \mid (Z \ge j), N)
       = \Big\{ 1 + \frac{\beta^{-1} \theta}{1 + (1 - \theta) \beta^{-1} t_j} \sum_{i=1}^{j} (t_i - t_{i-1})(1 - u_i) \Big\}^{-\alpha} .
8. Comparison between the distributions implied by models (P) and (N). The
comparisons between the implications of models (P) and (N) made thus far [3, 4,
and 5] refer to the distributions of X_1 with θ = 1. Using (50) and (69), we have
(70)   G_{X_1}(u_1 \mid (\theta = 1), P) = [ 1 + (A_1 - 1)(1 - u_1) ]^{-\gamma}
and
(71)   G_{X_1}(u_1 \mid (\theta = 1), N) = [ 1 + \beta^{-1} t_1 (1 - u_1) ]^{-\alpha} .
It is seen that, with α = γ and β^{-1} t_1 = (A_1 - 1), the two distributions coincide so
that no amount of empirical data regarding X_1 alone can afford means of distinguishing
between the two models.
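The coincidence of (70) and (71) is easy to confirm numerically: given any parameters of model (P), the choice α = γ = 1/μ and β = t_1/(A_1 - 1) makes the two generating functions agree for every u_1. The function names and parameter values in this sketch are illustrative assumptions.

```python
def pgf_single_period_P(u, t1, lam, mu, nu):
    """(70): pgf of X_1 under model (P) with theta = 1."""
    a1 = (1.0 + nu * t1) ** (lam * mu / nu)
    return (1.0 + (a1 - 1.0) * (1.0 - u)) ** (-1.0 / mu)


def pgf_single_period_N(u, t1, alpha, beta):
    """(71): pgf of X_1 under model (N) with theta = 1."""
    return (1.0 + t1 / beta * (1.0 - u)) ** (-alpha)


def matching_N_parameters(t1, lam, mu, nu):
    """alpha and beta for which (71) reproduces (70)."""
    a1 = (1.0 + nu * t1) ** (lam * mu / nu)
    return 1.0 / mu, t1 / (a1 - 1.0)


# Hypothetical model (P) parameters.
t1, lam, mu, nu = 2.0, 1.2, 0.4, 0.25
alpha, beta = matching_N_parameters(t1, lam, mu, nu)
grid = [k / 10.0 for k in range(11)]
largest_gap = max(abs(pgf_single_period_P(u, t1, lam, mu, nu)
                      - pgf_single_period_N(u, t1, alpha, beta)) for u in grid)
```

The quantity `largest_gap` is zero up to rounding, which is the single-period indistinguishability noted by Lundberg and Feller.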
The above comparison is not entirely relevant, since it is frequently impracticable
to ascertain the number of light accidents which the individuals of a population may
have incurred prior to the period of observation. For this reason it is doubtful
whether one could ever obtain data which could serve as an empirical counterpart
of the distribution generated by (71). The most one can hope to obtain is data
regarding individuals who were exposed to unobserved accidents for approximately
the same period of time, perhaps for a long time t_1, and then were subjected to
observation during one or more subsequent periods (t_1, t_2), (t_2, t_3), etc. Such, for
example, is true of the data on the London bus drivers discussed in Part I [1].
Before being employed by the London Transport Board, these 166 persons were
experienced drivers and many of them must have had quite a few accidents which
are not in the records. However, the time t_1 that elapsed between the obtaining of a
driver's license and the beginning of the employment in London could probably be
established with reasonable accuracy. Then, the statistics compiled for those drivers
for whom t_1 has the same value could be used as a counterpart of the theoretical
distributions of the random variables X_2, X_3, .... We will compare these distributions
for the two models (P) and (N), more generally, assuming that to each accident
there corresponds a fixed probability θ of survival.
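Consider first, then, formulas (50) and (69), with u_1 = 1. We have
(72)   G_{X_2, \dots, X_j}(u_2, \dots, u_j \mid (Z \ge j), P) = \Big\{ 1 + \theta C_j \sum_{i=2}^{j} (A_i - A_{i-1})(1 - u_i) \Big\}^{-\gamma}
and
(73)   G_{X_2, \dots, X_j}(u_2, \dots, u_j \mid (Z \ge j), N) = \Big\{ 1 + \beta^{-1} \theta C_j^* \sum_{i=2}^{j} (t_i - t_{i-1})(1 - u_i) \Big\}^{-\alpha}
where, for simplicity,
(74)   C_j = [ \theta + (1 - \theta) A_j ]^{-1}   and   C_j^* = [ 1 + (1 - \theta) \beta^{-1} t_j ]^{-1} .
It is seen that, if the observations are limited to one period only, e.g., from t_1 to t_2,
then the distributions implied by the two models are single-variate negative binomials
with two parameters each and are indistinguishable.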
However, if the observations refer to two or more equal consecutive intervals, say
(t_i, t_{i+1} = t_i + 1) for i = 1, 2, ..., j - 1, then the situation is changed considerably.
The coefficients of the binomials (1 - u_i) in (73) are all equal to
(75)   \beta^{-1} \theta C_j^* .
On the other hand, in (72) the coefficient of the binomial 1 - u_i is
(76)   \theta C_j (A_i - A_{i-1})
       = \theta C_j \{ [ 1 + \nu t_1 + \nu (i - 1) ]^{\lambda \mu / \nu} - [ 1 + \nu t_1 + \nu (i - 2) ]^{\lambda \mu / \nu} \} .
In order that the two distributions be forced to coincide by an appropriate choice
of the parameters, it is necessary and sufficient that (76) be independent of i, that
is to say, that
(77)   \lambda \mu = \nu .
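The role of condition (77) can be seen by tabulating the increments A_i - A_{i-1} that appear in (76) for unit periods: they are constant in i when λμ = ν and vary with i otherwise. The function name and the two parameter sets below are assumptions chosen for illustration.

```python
def increments(t1, j, lam, mu, nu):
    """A_i - A_{i-1} for i = 2..j with t_i = t_1 + (i - 1), the bracket in (76)."""
    def A(t):
        return (1.0 + nu * t) ** (lam * mu / nu)
    return [A(t1 + i - 1) - A(t1 + i - 2) for i in range(2, j + 1)]


def spread(values):
    """Difference between the largest and the smallest increment."""
    return max(values) - min(values)


# Case 1 (hypothetical): lam * mu = nu, so condition (77) holds.
balanced = increments(t1=4.0, j=6, lam=0.6, mu=0.5, nu=0.3)
# Case 2 (hypothetical): lam * mu > nu, so the coefficients grow with i.
unbalanced = increments(t1=4.0, j=6, lam=1.0, mu=0.5, nu=0.1)
```

In the first case every increment equals ν, so model (P) produces the equal coefficients that model (N) requires; in the second the increments rise from period to period.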
In other words, in comparing the joint distributions in models (P) and (N) of
accidents survived in two or more equal consecutive periods up to and including,
say, t_j, for those individuals who are known to be alive at t_j, we find that these joint
distributions coincide if and only if condition (77) is met by the parameters λ, μ, ν
of model (P). A comparison of formula (55) with (67), and (38) with (65) makes
clear that the same conclusion is reached when one compares the joint distributions
for those who are known to have succumbed to a fatal accident in the kth period,
say, with k = 1, 2, ..., j, or when one compares the two distributions of the number
of complete periods survived.
In principle, of course, this equality may be satisfied, and then the two schemes
(P) and (N) will be indistinguishable no matter how many variables X_2, X_3, ..., X_j
we observe. However, the satisfaction of the equality (77) is most unlikely, and then
the multivariate distribution implied by the model (P) will be different from that
implied by model (N). At any rate, should the empirical distribution of X_2, X_3, ...,
X_j indicate the inequality of the coefficients of the binomials (1 - u_i) then this is an
indication in favor of the model (P) rather than the model (N).
In the last section of the present paper we study the possibility of identifying the
nature of contagion (“true” or “false”) using a different set of observable random
variables.
9. Joint distribution of the number of light and the number of severe accidents.
In this section we use some of the formulae given above in order to deduce the joint
distributions of the number Y of light accidents incurred in one period of time of
unit length and the number X of severe accidents incurred in a subsequent period
of time, also of unit length, as implied by the Greenwood-Yule-Newbold model
supplemented by the fundamental hypothesis and by the assumption that to each
severe accident there corresponds a fixed probability θ of surviving it. The formulae
deduced here are given without proof in Part I of the present paper.
It will be realized that the process of determining the joint distribution of X and Y
is exactly similar to that of section 7.
The hypotheses assumed imply that for a given individual in the population, with
a fixed proneness λ to severe accidents, the variables X and Y are mutually independent
with the probability generating function of Y given by
(78)   G_Y(v \mid \lambda) = e^{-A \lambda (1 - v)}
where A is the modulus of frequency of the light accidents.
As to the variable X, we shall identify it with X_1 of the set of random variables
discussed in section 7. Substituting u for u_1, 1 for t_1 and unity for u_i with i > 1 in
formula (63) we have
(79)   P\{Z = 0\} G_X(u \mid (Z = 0), N) = F_0(\lambda, u) = \frac{1 - \theta}{1 - \theta u} \{ 1 - e^{-\lambda [ 1 - \theta + \theta (1 - u) ]} \} .
Similarly, we have from (63)
(80)   P\{Z \ge 1\} G_X(u \mid (Z \ge 1), N) = \sum_{j=1}^{s} F_j(\lambda, u, u_2, \dots, u_{j+1}) = e^{-\lambda [ 1 - \theta + \theta (1 - u) ]} .
Multiplying (79) by (78) we obtain, say
(81) $o(X, u, ?;) = ] ~ f I
I — 6u {
and similarly, multiplying (80) by (78),
(82) $,(X, u, v) = .
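Multiplying (81) and (82) by the probability density function of $\Lambda$ and integrating
for $\lambda$ from zero to infinity we obtain, say
(83)  $\Phi_0^*(u, v) = \frac{1 - \theta}{1 - \theta u}\,\beta^{\alpha}\left\{[\beta + A(1 - v)]^{-\alpha} - [\beta + 1 - \theta + \theta(1 - u) + A(1 - v)]^{-\alpha}\right\}$
and
(84)  $\Phi_1^*(u, v) = \beta^{\alpha}\,[\beta + 1 - \theta + \theta(1 - u) + A(1 - v)]^{-\alpha}$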
respectively. Now, exactly as in section 7,
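(85)  $\Psi(u, v \mid (Z = 0), N) = \frac{1 - \theta}{1 - \theta u} \cdot \frac{[\beta + A(1 - v)]^{-\alpha} - [\beta + 1 - \theta + \theta(1 - u) + A(1 - v)]^{-\alpha}}{\beta^{-\alpha} - (\beta + 1 - \theta)^{-\alpha}}$
and
(86)  $\Psi(u, v \mid (Z \ge 1), N) = \left[\frac{\beta + 1 - \theta}{(\beta + 1 - \theta) + \theta(1 - u) + A(1 - v)}\right]^{\alpha}$.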
Formulae (85) and (86) are the two probability generating functions sought. The
probabilities generated by (85) form the theoretical counterpart of the accident
statistics for those individuals of the population who meet with a fatal accident. The
probabilities generated by (86) correspond to the frequency distribution of accidents
compiled for those who survive the second period of observation.
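The two generating functions can be evaluated numerically. What follows is only a minimal sketch in Python, assuming that the proneness follows a gamma density with parameters $\alpha$ and $\beta$ (as the form of (83) and (84) suggests) and reading (85) and (86) as reconstructed from the derivation above. The function and argument names are hypothetical and do not come from the paper.

```python
def pgf_fatal(u, v, alpha, beta, theta, A):
    """Formula (85): joint p.g.f. of (X, Y) for individuals meeting a fatal accident.

    Assumes a gamma(alpha, beta) density of proneness and 0 <= theta < 1.
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1) for a fatal accident to be possible")
    light = A * (1 - v)
    num = (beta + light) ** (-alpha) - (
        beta + 1 - theta + theta * (1 - u) + light
    ) ** (-alpha)
    den = beta ** (-alpha) - (beta + 1 - theta) ** (-alpha)
    return (1 - theta) / (1 - theta * u) * num / den


def pgf_survivor(u, v, alpha, beta, theta, A):
    """Formula (86): joint p.g.f. of (X, Y) for individuals surviving the second period."""
    base = beta + 1 - theta
    return (base / (base + theta * (1 - u) + A * (1 - v))) ** alpha


def prob_survive(alpha, beta, theta):
    """Value of (84) at u = v = 1, i.e. the probability of surviving the second period."""
    return (beta / (beta + 1 - theta)) ** alpha
```

Both generating functions equal unity at u = v = 1, which gives a quick check on the normalizing constants. Differentiating `pgf_survivor` with respect to v at that point yields $\alpha A/(\beta + 1 - \theta)$, the expected number of light accidents among the survivors under these assumptions.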
10. More hopeful approach to the problem of distinguishing between models (P)
and (N). As was shown in section 8, the joint distribution of the numbers of
accidents survived in several consecutive periods of observation implied by model (P)
coincides with that implied by model (N) only in the improbable particular case
when $\lambda\mu = \nu$. Thus, given a substantial number of observations of the simultaneous
values of the random variables $X_1, X_2, \ldots, X_s$, it is possible to subject to a test the
hypothesis that the accidents considered are noncontagious and/or that there is no
time effect. Details of such tests must be relegated to a separate paper. However, the
authors anticipate that the power of the test contemplated will not be a very
satisfactory one. On the other hand, it seems plausible that the power of a test of the
same hypotheses may be much better if these hypotheses are tested on the
observations, not of the variables $X_1, X_2, \ldots, X_s$ considered above, but of time intervals
between successive accidents incurred by particular individuals. A complete study
of this problem also requires more space than can be given in the present paper.
However, it seems appropriate to include the present section outlining the new
approach in relation to a small section of the statistical data that one may expect to
have available, namely, in relation to those individuals who, during the period of
observation, sustain exactly one accident. In this outline we shall assume $\theta = 1$. On
the other hand, it appears possible to liberalize a little the original scheme of Pólya
by not insisting that the dependence of the derivatives (6) on the number m of
previous accidents is necessarily linear and by admitting that there may be a
variation in accident proneness from one individual of the population to the next.
Consider an individual I and assume that from the moment t = 0 on, he is exposed
to risk of accidents of a particular kind. For this individual we shall consider
probabilities $P_{m,n}(T_1, T_2)$, defined in section 2, and shall assume that these probabilities
depend on the number m of accidents sustained up to and including moment $T_1$ and
also on the value of $T_1$ but not on the precise moments when the previous accidents
occurred. Specifically, we shall assume that
(87)  $\left.\frac{\partial P_{m,n}(T_1, T_2)}{\partial T_2}\right|_{T_2 = T_1} =
      \begin{cases}
        -\dfrac{\lambda_m}{1 + \nu T_1}, & \text{if } n = 0, \\
        \dfrac{\lambda_m}{1 + \nu T_1}, & \text{if } n = 1, \\
        0, & \text{if } n > 1,
      \end{cases}$
where $\lambda_0, \lambda_1, \ldots, \lambda_m, \ldots$ are arbitrary nonnegative numbers and $\nu$ is subject to the
restriction that for values of t limited to the period of observation $1 + \nu t > 0$.
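Following the usual procedure, it is easy to find that
(88)  $P_{m,0}(T_1, T_2) = \left(\frac{1 + \nu T_1}{1 + \nu T_2}\right)^{\lambda_m/\nu}$
and, using the assumption that $\lambda_m \ne \lambda_{m+1}$,
(89)  $P_{m,1}(T_1, T_2) = \frac{\lambda_m}{\lambda_{m+1} - \lambda_m}\left(\frac{1 + \nu T_1}{1 + \nu T_2}\right)^{\lambda_{m+1}/\nu}\left[\left(\frac{1 + \nu T_2}{1 + \nu T_1}\right)^{(\lambda_{m+1} - \lambda_m)/\nu} - 1\right]$.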
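Formulae (88) and (89) can be checked numerically. The sketch below is illustrative only: it assumes that the "usual procedure" amounts to integrating, over the moment t of the single accident, the product of the probability of no accident up to t, the intensity $\lambda_m/(1 + \nu t)$, and the probability of no further accident after t. That reading, the case $\nu = 0$ treated as a limit, and all function names are assumptions made here, not statements of the paper.

```python
import math


def p_no_accident(lam_m, nu, t1, t2):
    """Formula (88): probability of no accident in (t1, t2) after m accidents."""
    if 1 + nu * t1 <= 0 or 1 + nu * t2 <= 0:
        raise ValueError("1 + nu*t must stay positive over the period considered")
    if nu == 0:
        # Limit of (88) as nu tends to zero (no time effect).
        return math.exp(-lam_m * (t2 - t1))
    return ((1 + nu * t1) / (1 + nu * t2)) ** (lam_m / nu)


def p_one_accident(lam_m, lam_next, nu, t1, t2):
    """Formula (89), written as a difference of two powers; needs lam_m != lam_next."""
    psi = lam_next - lam_m
    if psi == 0:
        raise ValueError("formula (89) assumes lam_m != lam_next")
    return lam_m / psi * (
        p_no_accident(lam_m, nu, t1, t2) - p_no_accident(lam_next, nu, t1, t2)
    )


def p_one_accident_numeric(lam_m, lam_next, nu, t1, t2, steps=2000):
    """Midpoint-rule integral over the moment t of the single accident (assumed form)."""
    h = (t2 - t1) / steps
    total = 0.0
    for k in range(steps):
        t = t1 + (k + 0.5) * h
        total += (
            p_no_accident(lam_m, nu, t1, t)
            * lam_m / (1 + nu * t)
            * p_no_accident(lam_next, nu, t, t2)
        )
    return total * h
```

For instance, `p_one_accident(0.8, 1.5, 0.4, 2.0, 3.0)` and `p_one_accident_numeric(0.8, 1.5, 0.4, 2.0, 3.0)` should agree to within the error of the midpoint rule.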
Obviously, $P_{m,0}(T_1, T_2)$ is a decreasing function of $\lambda_m$. Thus, if all the $\lambda$'s have the
same value, the model implies the absence of contagion in the accidents. If the $\lambda$'s
increase,
(90)  $\lambda_0 < \lambda_1 < \cdots < \lambda_m < \lambda_{m+1} < \cdots$,
then we shall speak of “regular positive” contagion, meaning that the more
accidents the individual had in the past, the more intense is the risk of accidents in the
future. If the $\lambda$'s decrease,
(91)  $\lambda_0 > \lambda_1 > \cdots > \lambda_m > \lambda_{m+1} > \cdots$,
then we shall speak of “regular negative” contagion. This would be the case where
previous accidents “teach” the individual how to avoid accidents in the future.
Finally there is the possibility of the sequence of the $\lambda$'s being nonmonotone. In
this case we shall speak of “irregular contagion.”
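The terminology just introduced can be summarized in a small routine. This is a sketch only: the function name and the returned labels are hypothetical, the inequalities in (90) and (91) are read as strict, and a non-constant sequence that is monotone but contains ties is reported separately because the text does not cover that case.

```python
def classify_contagion(lams):
    """Classify a finite sequence lambda_0, lambda_1, ... by the kind of contagion."""
    if len(lams) < 2:
        raise ValueError("at least two values of lambda are needed")
    pairs = list(zip(lams, lams[1:]))
    if all(b == a for a, b in pairs):
        return "no contagion"
    if all(b > a for a, b in pairs):
        return "regular positive"  # strictly increasing, as in (90)
    if all(b < a for a, b in pairs):
        return "regular negative"  # strictly decreasing, as in (91)
    rises = any(b > a for a, b in pairs)
    falls = any(b < a for a, b in pairs)
    if rises and falls:
        return "irregular"  # nonmonotone sequence
    return "unclassified (monotone with ties)"
```

Thus `classify_contagion([0.5, 0.8, 1.2])` returns `"regular positive"`, while `classify_contagion([0.5, 1.2, 0.8])` returns `"irregular"`.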
Now we shall assume that the individual I is observed for a unit of time, from $T_1$
to $T_1 + 1$. The object of this section is to deduce the conditional distribution of the
random variable $\tau$ defined as the time between the beginning of the period of
observation $T_1$ and the moment when the individual I sustains an accident, given that
up to and including moment $T_1$ the individual sustained exactly m accidents and
given that between $T_1$ and $T_1 + 1$ he sustains exactly one accident. Obviously
$0 < \tau \le 1$.
For this purpose we compute the conditional probability, given exactly m
accidents up to moment $T_1$, of the simultaneous occurrence of two events. One event is
that during the period of observation the individual I sustains exactly one accident.
The second event consists in $\tau$ not exceeding an arbitrary positive number $t \le 1$.
The conditional probability just defined coincides with the conditional probability,
given exactly m accidents up to moment $T_1$, that between $T_1$ and $T_1 + t$ the
individual will have exactly one accident and that between $T_1 + t$ and $T_1 + 1$ he will
have no accidents. Thus, this probability is easy to obtain from (88) and (89),
(92)  $P_{m,1}(T_1, T_1 + t)\, P_{m+1,0}(T_1 + t, T_1 + 1) = \frac{\lambda_m}{\lambda_{m+1} - \lambda_m}\left(\frac{a}{a + \nu}\right)^{\lambda_{m+1}/\nu}\left[\left(\frac{a + \nu t}{a}\right)^{(\lambda_{m+1} - \lambda_m)/\nu} - 1\right]$,
where, for the sake of simplicity,
(93)  $a = 1 + \nu T_1$.
Dividing (92) by the conditional probability of exactly one accident between $T_1$
and $T_1 + 1$, we obtain the conditional probability of $\tau \le t$, given that up to $T_1$ the
individual I had exactly m accidents and that in $(T_1, T_1 + 1)$ he sustains exactly one
accident. This probability is, then, the conditional distribution function of $\tau$, say
(94)  $F_\tau(t \mid \psi_m, \nu) = \frac{P_{m,1}(T_1, T_1 + t)\, P_{m+1,0}(T_1 + t, T_1 + 1)}{P_{m,1}(T_1, T_1 + 1)} = \frac{(1 + \nu t/a)^{\psi_m/\nu} - 1}{(1 + \nu/a)^{\psi_m/\nu} - 1}$,
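where, for the sake of brevity, $\psi_m = \lambda_{m+1} - \lambda_m$. The corresponding probability
density function is, say
(95)  $p_\tau(t \mid \psi_m, \nu) = \frac{\psi_m\,(1 + \nu t/a)^{\psi_m/\nu}}{(a + \nu t)\left[(1 + \nu/a)^{\psi_m/\nu} - 1\right]}$.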
We are particularly interested in the following three special cases obtainable from
(95) by simple passages to the limit.
i) If $\psi_m = 0$, but $\nu$ is unspecified, then the mth accident is “noncontagious” and
we have, say
(96)  $p_\tau[t \mid (\psi_m = 0), \nu] = \frac{\nu}{(a + \nu t)\log\left(1 + \frac{\nu}{a}\right)}$.
ii) If $\nu = 0$ but $\psi_m$ is unspecified, then there is no time effect but there may be
contagion and
(97)  $p_\tau[t \mid \psi_m, (\nu = 0)] = \frac{\psi_m e^{\psi_m t}}{e^{\psi_m} - 1}$.
iii) Finally, when both $\psi_m = 0$ and $\nu = 0$, then we have the no contagion-no
time-effect case and
(98)  $p_\tau[t \mid (\psi_m = 0), (\nu = 0)] = 1$.
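The density (95) and its three limiting forms can be gathered in one routine. The following is a minimal sketch, assuming $a = 1 + \nu T_1$ and the forms of (95) through (98) obtained from (94) by differentiation and passage to the limit. The function name and its arguments are hypothetical.

```python
import math


def tau_density(t, psi, nu, t1=0.0):
    """Density of tau at 0 < t <= 1 for given psi_m, nu and start of observation T1."""
    if not 0.0 < t <= 1.0:
        raise ValueError("t must satisfy 0 < t <= 1")
    a = 1.0 + nu * t1
    if a <= 0.0 or a + nu <= 0.0:
        raise ValueError("1 + nu*t must stay positive over the period of observation")
    if psi == 0.0 and nu == 0.0:
        return 1.0  # case iii), formula (98)
    if psi == 0.0:
        # case i), formula (96): time effect without contagion
        return nu / ((a + nu * t) * math.log1p(nu / a))
    if nu == 0.0:
        # case ii), formula (97): contagion without time effect (here a = 1)
        return psi * math.exp(psi * t) / math.expm1(psi)
    # general case, formula (95); expm1 and log1p keep the result accurate
    # when psi/nu or nu/a is small
    r = psi / nu
    numerator = psi * math.exp(r * math.log1p(nu * t / a))
    denominator = (a + nu * t) * math.expm1(r * math.log1p(nu / a))
    return numerator / denominator
```

With $\psi_m = \nu$ the general branch reduces to unity for every t, for example `tau_density(0.3, 0.7, 0.7, t1=2.0)`, so that in this one case the density is indistinguishable from (98).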
All five formulae (94) through (98) are given for $0 < t \le 1$. They refer to a
particular individual with a fixed number m of previous accidents and with fixed $\psi_m$ and $\nu$.
It will be seen that their use is likely to give a substantial insight into the mechanism
of accident proneness. The particularly interesting point is that, at least in some
respects, the effect of variation from one individual of the population to another is
now separated from contagion and time effect. Thus, if accident proneness conforms
exactly with the no contagion-no time-effect mixture model of Greenwood, Yule,
and Newbold, whether including the particular postulated distribution function of $\Lambda$
or not, then the time intervals $\tau$ observed for arbitrary individuals of the population
will be uniformly distributed between zero and unity as implied by (98). Any
departure from this distribution is, then, evidence of either time effect or contagion
or both. Furthermore, the distribution (95) applicable in the general case coincides
with (98) only when $\psi_m = \nu$.
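One way to look for a departure from the uniform distribution (98) is sketched below. It is not the test contemplated by the authors, whose details are left to a separate paper; the standard Kolmogorov-Smirnov statistic with its limiting distribution is substituted here purely for illustration, and the function names are hypothetical.

```python
import math


def ks_uniform_statistic(taus):
    """Largest gap between the empirical distribution of tau and the uniform law on (0, 1]."""
    xs = sorted(taus)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one observed interval is needed")
    if any(not 0.0 < x <= 1.0 for x in xs):
        raise ValueError("every tau must satisfy 0 < tau <= 1")
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def ks_asymptotic_pvalue(d, n, terms=100):
    """Kolmogorov's limiting tail probability for sqrt(n) * D (large-sample approximation)."""
    x = math.sqrt(n) * d
    if x < 0.2:
        return 1.0  # the tail probability is indistinguishable from one this close to zero
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))
```

A small value of `ks_asymptotic_pvalue(ks_uniform_statistic(taus), len(taus))` would then point to time effect, contagion, or both, subject to the usual caution about applying a limiting distribution to small samples.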
The identity, with respect to any characteristic, of all individuals of a living
population is always rather improbable. In particular it may be taken for granted
that m will vary from one individual to another. Consequently, if by and large the
accidents studied are subject to contagion and/or to time effect, it is most likely
that at least for some individuals of the population the equality $\psi_m = \nu$ will not be
satisfied and that, therefore, the study of the empirical distribution of $\tau$ will indicate
the true nature of the machinery of accident proneness. This is particularly probable
in the two “regular” cases, in which the sequence of the $\lambda$'s is monotone.
However, it may be hoped that a study of time intervals for individuals incurring
two or more accidents during the period of observation will throw some light also
on the irregular case in which, owing to the variation in m, $\psi_m$ is positive for some
individuals of the population and negative for others.
REFERENCES
[1] Grace E. Bates and Jerzy Neyman, “Contributions to the theory of accident proneness.
I. An optimistic model of the correlation between light and severe accidents.” Univ.
Calif. Publ. Statist., Vol. 1, pp. 215-254.
[2] G. Pólya, “Sur quelques points de la théorie des probabilités,” Ann. de l'Institut Henri
Poincaré, Vol. 1 (1930), pp. 117-161.
[3] Ove Lundberg, On random processes and their application to sickness and accident insurance.
Uppsala, Almqvist and Wiksells, 1940. 172 pp.
[4] W. Feller, “On the theory of stochastic processes, with particular reference to applications,”
Proceedings, Berkeley Symposium on Mathematical Statistics and Probability. Berkeley,
University of California Press, 1949, pp. 403-432.
[5] W. Feller, An introduction to probability theory and its applications. New York, Wiley, 1950.
xii + 419 pp.
[6] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin, Julius Springer, 1933.